I think the single biggest insight of the past 2 days is that dnorm() does not output the probability that x will take on a particular value. Rather, dnorm() outputs the height of the density curve at a particular value of x. This height can be used later to calculate a probability for a range of x values (e.g., from -1 to 1), but in the short term dnorm() just outputs a y-value for each x within a normal distribution with a given mean and sd. A narrower sd leads to higher y-values near the mean, because more of the probability density is concentrated near the mean in distributions with little variance.
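To make the height-versus-probability distinction concrete, here is a minimal sketch in base R. It assumes a standard normal (mean 0, sd 1), and the variable names are my own, chosen only for illustration.

```r
# Height of the standard normal density curve at x = 0 (not a probability).
height_at_0 <- dnorm(0, mean = 0, sd = 1)

# Probability that x falls between -1 and 1: the area under the curve,
# computed two ways (numerical integration and the CDF).
area_integrate <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 1)$value
area_pnorm <- pnorm(1) - pnorm(-1)

# With a small sd the height exceeds 1, so it cannot be a probability.
height_narrow <- dnorm(0, mean = 0, sd = 0.1)

height_at_0     # about 0.399
area_pnorm      # about 0.683
height_narrow   # about 3.99
```

The last value is the giveaway for me: a probability can never be larger than 1, but a density height can.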
library(knitr)
library(tidyverse)
source(here::here("file_paths.R"))
file_normal_pdf <- paste(dir_images, "normal_pdf.png", sep = "/")
runif(n, min, max)
dunif(x, min, max) will output the y-axis height at x of the uniform distribution between min and max.
dunif(x = c(.25, .5), min = 0, max = 1)
But note that this y-axis output from the uniform density function is not the probability that the random variable will equal x; it is just the height of the density curve at x. Further computation is necessary to determine probabilities, and when x is continuous those probabilities must be specified for a range of values rather than a single value.
Note that for any x between min and max, the y-axis height equals 1 / (max - min), i.e., 1 divided by the range of the uniform distribution. Outside that range the height is 0.
dunif(x = 10, min = 0, max = 120)
dunif(x = 10.5, min = 0, max = 120)
1/120
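As a quick check of the constant-height idea, the sketch below evaluates dunif() at a few points inside and outside the 0 to 120 range. The x values and variable names are arbitrary picks of mine for illustration.

```r
range_min <- 0
range_max <- 120

# Two points outside the range (-5 and 130) and three inside it.
x_values <- c(-5, 10, 10.5, 119, 130)

heights <- dunif(x_values, min = range_min, max = range_max)
expected_inside <- 1 / (range_max - range_min)

# Inside the range every height equals 1/120; outside it the height is 0.
data.frame(x = x_values, height = heights)
expected_inside
```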
a <- 0
b <- 0.5
punif(b) - punif(a)
# Probability computed using the integrate() function of the pdf dunif()
integrate(f = dunif, lower = 0.1, upper = 0.5)
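Because the uniform density is flat, the area under it is just a rectangle, which gives a third way to get the same probability. The sketch below is my own illustration on the 0 to 120 range, with an arbitrary interval from 10 to 40.

```r
a <- 10
b <- 40

# Probability from the cumulative distribution function.
prob_cdf <- punif(b, min = 0, max = 120) - punif(a, min = 0, max = 120)

# Probability as rectangle area: interval width times density height.
prob_area <- (b - a) * dunif(a, min = 0, max = 120)

# Probability from numerical integration of the density.
prob_int <- integrate(dunif, lower = a, upper = b, min = 0, max = 120)$value

c(cdf = prob_cdf, area = prob_area, integrate = prob_int)  # all 0.25
```

This is where the height from dunif() finally turns into a probability: only after it is multiplied by a width.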
First Google hit with code
U of Arizona site says that dunif() is really only useful for graphing, not for directly calculating probabilities.
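In his example below I see that dunif() works even though x is never defined. That is because curve() treats x in expr as the plotting variable and generates its values itself, using the from and to arguments.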
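One way to see where x comes from is to capture what curve() returns. As far as I understand it, curve() invisibly returns the x grid it built and the y values it computed. The sketch below assumes a small grid (n = 9, my choice) so the values are easy to read; it still draws a plot as a side effect.

```r
# x is never assigned here; curve() supplies it when evaluating the expression.
pts <- curve(dunif(x, min = 2, max = 6), from = 0, to = 8, n = 9)

pts$x  # the grid curve() generated: 0, 1, ..., 8
pts$y  # dunif() evaluated on that grid: 0.25 between 2 and 6, 0 elsewhere
```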
curve( expr = dunif(x , min = 2 , max = 6), from = 0, to = 8, ylim = c(0 , 0.5), ylab = "f(x)", main = "Uniform Density f(x)" )
tibble(x = seq(from = 0, to = 100, by = .1)) %>% ggplot(aes(x = x, y = dnorm(x, mean = 50, sd = 15))) + geom_line() + ylab("density")
Here dnorm() takes the 1,001 data points in x (0 to 100 in steps of 0.1) and returns the height of the density curve at each one, so that plotting y on x traces out a normal curve.
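A small check of what dnorm() is and is not doing with that grid of x values. The sketch is my own: it counts the points, then compares the raw sum of the heights with the sum of height times step width, which is a rough Riemann-sum approximation of the area.

```r
step <- 0.1
x <- seq(from = 0, to = 100, by = step)
length(x)  # 1001 grid points

heights <- dnorm(x, mean = 50, sd = 15)

# The heights themselves do not sum to 1, so they are not probabilities.
sum_heights <- sum(heights)

# Height times step width approximates the area under the curve from 0 to 100.
approx_area <- sum(heights * step)

# Exact area over the same range from the CDF, for comparison.
exact_area <- pnorm(100, mean = 50, sd = 15) - pnorm(0, mean = 50, sd = 15)

c(sum_heights = sum_heights, approx_area = approx_area, exact_area = exact_area)
```

If the grid were made finer, sum_heights would keep growing while approx_area would stay near the exact area.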
x <- seq(from = 0, to = 100, by = 1)
dnorm(x, 50, 15)
dnorm(x, 50, 15) %>% length()
# Notice how a smaller sd of the distribution increases the max y value.
dnorm(x, 50, 15) %>% max()
dnorm(x, 50, 1) %>% max()
The above code demonstrates that there are several moving parts of interest. x is just a vector of values. dnorm() will tell us the y-value associated with each of these x-values on a probability density curve with a given mean and sd. A narrower sd leads to higher y-values near the mean, because more of the probability density is concentrated near the mean in distributions with little variance.
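The link between sd and peak height can be written down exactly: the normal density at its mean is 1 / (sd * sqrt(2 * pi)). The sketch below compares that formula with dnorm(); peak_height() is a helper name I made up for illustration, and the sd values are arbitrary.

```r
# Hypothetical helper: height of a normal density at its own mean.
peak_height <- function(sd) 1 / (sd * sqrt(2 * pi))

sds <- c(1, 5, 15)
from_dnorm <- dnorm(50, mean = 50, sd = sds)
from_formula <- peak_height(sds)

# The two columns agree, and the peak shrinks as sd grows.
data.frame(sd = sds, dnorm_at_mean = from_dnorm, formula = from_formula)
```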
tibble(x = seq(from = 0, to = 100, by = 1)) %>% ggplot(aes(x = x, y = dnorm(x, mean = 50, sd = 15))) + geom_line() + ylab("density")
knitr::include_graphics(file_normal_pdf)
The probability density function (PDF) outputs the height of the y-axis on the probability density curve at the provided x value. The PDF does NOT output the probability that a random variable will take on a particular value of x. A probability can only be calculated for x falling within a range of values, because the probability of a continuous random variable taking on any single specific value is 0 (informally, 1/inf).
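To see why a single point gets no probability, the sketch below shrinks an interval around the mean of a normal distribution (mean 50, sd 15, matching the plots above) and watches the probability fall toward 0. The half-widths are arbitrary values I picked.

```r
half_widths <- c(10, 1, 0.1, 0.001)

# P(50 - h < x < 50 + h) for each half-width h, from the CDF.
probs <- pnorm(50 + half_widths, mean = 50, sd = 15) -
  pnorm(50 - half_widths, mean = 50, sd = 15)

# For tiny intervals the probability is roughly width times density height.
approx_probs <- 2 * half_widths * dnorm(50, mean = 50, sd = 15)

data.frame(half_width = half_widths, prob = probs, width_x_height = approx_probs)
```

The density height at 50 stays fixed the whole time; it is the shrinking width that drives the probability to 0.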
In contrast, the binomial probability distribution has a probability mass function (PMF) rather than a density function because its outcomes are discrete. The output of a PMF is an actual probability.
# outputs Pr(1 head in 1 flip)
success_n <- 1
trials_n <- 1
prob_success <- .5
dbinom(success_n, trials_n, prob_success)

# outputs probability of 5 heads in 10 flips
success_n <- 5
trials_n <- 10
prob_success <- .5
dbinom(success_n, trials_n, prob_success)
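Unlike the density heights, the values from dbinom() are probabilities in their own right, so they should sum to 1 across all possible outcomes. A minimal sketch of that check for 10 fair coin flips (the variable names follow the code above; pmf and p_five_heads are my additions):

```r
trials_n <- 10
prob_success <- 0.5

# Probability of every possible number of heads, from 0 to 10.
pmf <- dbinom(0:trials_n, size = trials_n, prob = prob_success)

sum(pmf)  # 1: the PMF values are probabilities

# P(5 heads) two ways: from dbinom() and by counting outcomes directly.
p_five_heads <- dbinom(5, size = trials_n, prob = prob_success)
p_five_heads
choose(10, 5) / 2^10  # 252 / 1024
```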